EN FR
EN FR


Section: New Results

Efficient XML and RDF data management

Participants : Mohamed Amine Baazizi, Nicole Bidoit, Dario Colazzo, François Goasdoué, Konstantinos Karanasos, Asterios Katsifodimos, Julien Leblay, Noor Malla, Ioana Manolescu, Alexandra Roatis, Marina Sahakyan, Federico Ulliana.

Materialized views for XML

We have continued our work on optimizing XML queries through materialized view-based rewriting, implemented within the ViP2P system. We published in IEEE ICDE 2011 an algorithm for rewriting XQuery queries using materialized XQuery view, which improves the state of the art in terms of expressive power of the supported XQuery subset, in collaboration with V. Vasssalos (AUEB, Greece) [41] . Two follow-up works concern: efficient algebraic algorithms for incrementally maintaining the materialized views when the underlying documents change, in collaboration with A. Bonifati (CNR, Italy) [35] , and algorithms for automatically recommending views to materialize for a given XML query workload, with V. Vassalos.

Type-based Update Optimization for XML

XML projection is a well-known optimization technique for reducing memory consumption for XQuery in-memory engines in order to overcome the main-memory limitations of these systems (Galax, Saxon, QizX, and eXist). One of our main research line focuses on a schema-based projection technique for for update optimization. The update language considered is XQuery Update Facility (XUF). The main idea behind this technique is: given a query q over an XML document t, instead of evaluating q over t, the query q is evaluated on a smaller document t ' obtained from t by pruning out, at loading time, parts of t that are irrelevant for q. The queried document t ' , a projection of the original one, is often much smaller than t due to selectivity of queries.

The scenario and type-based projection proposed for XML queries, cannot be applied directly for updates. We have proposed a new scenario which is composed of four steps:

  1. from the update U and the DTD D, a type projector P is inferred;

  2. the document t, valid wrt D, is projected following P in a streaming manner, at loading time;

  3. U is evaluated over the projection P(t) and produces a partial result U(P(t));

  4. the initial document t is merged with U(P(t)), in a streaming manner, at writing (serializing) time in order to produce the final result U(t).

The scenario has been first studied and implemented for a kind of type projector which is a good compromise between simplicity and effectiveness, and corresponding results have been published in [31] . Subsequently, We have improved this technique by designing: (i) a new kind of type projector that minimizes the amount of data kept in the projection, and (ii) a new merge algorithm using the improved type projector. This analysis is complicated by the strong interconnection between the two tasks: while minimizing the projection we need to ensure a fast and correct merge process [17] .

These results have also been presented in [34] , providing an overview on the use of types and constraints from relational to XML data, and in the tutorial [63] focusing on schema-based techniques for safe and efficient XML processing.

XML query-update independence

A query and an update are independent when the query result is not affected by update execution, on any possible input database. Detecting query-update independence is of crucial importance in many contexts: view maintenance, concurrency, access control policies etc. Benefits are amplified when query-update independence can be checked statically. We propose a novel schema-based approach for detecting XML query-update independence. Differently from traditional schema-based analysis for XQuery, our system infers sequence of labels, called chains, that are vertically navigated in each schema instance by query and update paths. More precisely, for each node that can be selected by a query/update path in a schema instance, the system infers a chain recording: a) all labels that are encountered from the root to the selected node, and b) the order in which these labels are traversed. The contextual and ordering information provided these chains is at the basis of and extremely precise static independence analysis. We have devised a quite precise chain-inference system, and devised techniques for efficient implementation of the chain-based independence analysis. Results and experiments concerning this line of research have been recently submitted to an international conference.

Precision and complexity of XQuery type inference

A key feature of XQuery is its type system. Any language expression is statically typed and its type is used during program type-checking. In XQuery, types of input data and functions are defined in terms of regular expression types, but it is quite easy to write queries that generate non-regular languages. As a consequence, any type system for XQuery has to rely on a type inference process that approximates the (possibly non-regular) output type of a query with a regular type. This approximation process, while mandatory and unavoidable, may significantly decrease the precision of the inferred types. In [37] we study and compare in terms of precision and computational complexity two main existing XQuery type systems.

Managing temporal XML documents

The management of temporal data is a crucial issue in many database applications. We are currently investigating efficient storage and update methods for temporal XML documents, with a focus on compactness of the representation. One of the method developed relies on the type based optimization method developed for updates [31] . First results about this line of research are included in recent publications [29] , [30] . Next research activities focus on the design of expressive temporal query and update languages, and on the use of techniques proposed in [29] for query and update optimization.

Materialized view selection for RDF

Syntactically, RDF, the data format of the Semantic Web, resembles relational data. However, RDF query processing is significantly complicated by the irregular nature of RDF data and by its simplistic data model, which leads to syntactically complex queries (involving many joins over the whole triple data set). When a query workload is known, the performance of the workload can be significantly improved by materializing access support data structures such as materialized views. Our efficient algorithms for selecting a set of views to materialize in order to speed up the processing of a set of RDF queries are described in a recent publication [21] . The prototype implementing them has been demonstrated at [53] .

Hybrid models for XML and RDF

We have obtained interesting results in the area of jointly managing XML and RDF data. A first direction of work in this area was to support annotated documents, that is, XML documents where individual nodes or fragments could be annotated with RDF triples. This model allows to capture, for instance, blog comments, user ratings on social sites etc. We have proposed a general model based on W3C standards for modeling such data [39] .

RDF query answering

The current trend for efficiently querying RDF datasets consists of delegating query evaluation to a scalable RDBMS. However, RDF query answering requires in addition to handle – outside the RDBMS – the RDF semantics. We have introduced the database (DB) fragment of RDF, encompassing the popular Description Logic (DL) one with essential RDF features like modeling incomplete information, for which we have devised novel saturation- and reformulation-based techniques for answering the Basic Graph Pattern (BGP) queries of SPARQL. This extends the state of the art on pushing RDF query processing within RDBMSs.